Abstract

We consider linear precoding and decoding in the downlink of a multiuser multiple-input, multiple-output
(MIMO) system, wherein each user may receive more than one data stream. We propose
several mean squared error (MSE) based criteria for joint transmit-receive optimization and establish a 
series of relationships linking these criteria to the signal-to-interference-plus-noise ratios of individual 
data streams and the information theoretic channel capacity under linear minimum MSE decoding. 
In particular, we show that achieving the maximum sum throughput is equivalent to minimizing the 
product of MSE matrix determinants (PDetMSE). Since the PDetMSE minimization problem does not 
admit a computationally efficient solution, a simplified scalar version of the problem is considered that 
minimizes the product of mean squared errors (PMSE). An iterative algorithm is proposed to solve the 
PMSE problem, and is shown to provide near-optimal performance with greatly reduced computational 
complexity. Our simulations compare the achievable sum rates under linear precoding strategies to the 
sum capacity for the broadcast channel. 

I. Introduction 

The benefits of using multiple antennas for wireless communication systems are well known. 
When antenna arrays are present at the transmitter and/or receiver, multiple-input multiple-output 
(MIMO) techniques can utilize the spatial dimension to yield improved reliability, increased data 
rates, and the spatial separation of users. In this paper, the methods we propose will focus on 
exploiting all of these features, with the goal of maximizing the sum data rate achieved in the
MIMO multiuser downlink. 

The optimal strategy for maximizing sum rate in the multiuser MIMO downlink, also known as 
the broadcast channel (BC), was first proposed in [1]; the authors prove that Costa's dirty paper
coding (DPC) strategy [2] achieves the sum capacity for a pair of single-antenna users. The sum-rate
optimality of DPC was generalized to an arbitrary number of multi-antenna receivers using
the notions of game theory [3] and uplink-downlink duality [4], [5]; this duality is employed 
in [6], [7] to derive iterative solutions that find the sum capacity. DPC has been shown to be 
the optimal precoding strategy not only for sum capacity, but also for the entire capacity region 
in the BC [8]. Unfortunately, finding a practical realization of the DPC precoding strategy has 
proven to be a difficult problem. Existing solutions, which are largely based on Tomlinson-Harashima
precoding (THP) [9]-[12], incur high complexity due to their nonlinear nature and
the combinatorial problem of user order selection. THP-based schemes also suffer from rate loss 
when compared to the sum capacity due to modulo and shaping losses. 

Linear precoding provides an alternative approach for transmission in the MIMO downlink, 
trading off a reduction in precoder complexity for suboptimal performance. Orthogonalization-based
schemes use zero forcing (ZF) and block diagonalization (BD) to transform the multiuser
downlink into parallel single-user systems [13], [14]. A waterfilling power allocation can then 
be used to allocate powers to each of the users [15]. The simplicity of these approaches comes 
at the expense of an antenna constraint requiring at least as many transmit antennas as the 
total number of receive antennas. These schemes, therefore, restrict the possibility of gains 
from additional receiver antennas. The constraint is relaxed under successive zero forcing [16], 
which requires only partial orthogonality but incurs higher complexity in finding an optimal 
user ordering. Coordinated beamforming [17] and generalized orthogonalization [18] are able to 
avoid the antenna constraint via iterative optimization of transmit and receive beamformers. 

It is also possible to improve the sum rate achieved with ZF and BD by including user or 
antenna selection in the precoder design. The sum-rate maximizing ZF precoder can be found 
by comparing precoders for all possible subsets of available receive antennas [1]; however, this 
strategy incurs complexity that grows exponentially with the total number of receive antennas.
Greedy and suboptimal strategies for user selection [19]-[22] may also be applied with lower 
computational cost. However, user selection is outside the scope of this paper; our goal here 
is to focus on the rates achievable under linear precoding. While all of these schemes possess
lower complexity than the THP-based methods, the use of orthogonalization results in suboptimal
performance due to noise enhancement. In this paper, we consider the optimal formulation for 
sum rate maximization under linear precoding. 

Much of the existing literature on linear precoding for multiuser MIMO systems focuses on 
minimizing the sum of mean squared errors (SMSE) between the transmitted and received signals 
under a sum power constraint [23]-[28]. An important recurring theme in most of these papers 
is the use of an uplink-downlink duality for both MSE and signal-to-interference-plus-noise ratio 
(SINR) introduced in [24] for the single receive antenna case and extended to the MIMO case 
in [26], [27]. These MSE and SINR dualities are equally applicable to sum rate maximization. 

Linear precoding approaches to sum rate maximization have been proposed for both single- 
antenna receivers [29], [30] and for multiple antenna receivers [31]-[33]. In [29], the authors
suggest an iterative method for direct optimization of the sum rate, while [30] and [31] exploit 
the SINR uplink-downlink duality of [24], [26], [27]. In [32] and [33], two similar algorithms 
were independently proposed to minimize the product of the mean squared errors (PMSE) in 
the multiuser MIMO downlink; these papers showed that the PMSE minimization problem is 
equivalent to the direct sum rate maximization proposed in [29]-[31]. The work of [33] was
motivated by the equivalence relationship developed between the single user minimum MSE 
(MMSE) and mutual information in [34]. Each of the approaches in [29]-[33] yields a suboptimal 
solution, as the resulting solutions converge only to a local optimum, if at all. 

Given this prior work in linear precoding, an important motivation for this paper is to determine 
the performance upper bound achievable under linear precoding and to evaluate how closely 
PMSE minimization comes to approaching this upper bound. In the single-user multicarrier 
case, minimizing the PMSE is equivalent to minimizing the determinant of the MSE matrix 
and thus is also equivalent to maximizing the mutual information [35]. This equivalence does 
not apply to the multiuser scenario. In this paper, we investigate the relationship between the 
MSE-matrix determinants, the mutual information, and the maximum achievable sum rate under 
linear precoding in the multiuser MIMO downlink, resulting in an optimization problem based on 
minimizing the product of the determinants of all users' MSE matrices (PDetMSE). Furthermore, 
we highlight the differences between the joint (multi-stream) optimization that arises from the
PDetMSE approach and the scalar (per-stream) PMSE-based solution. Although the PMSE approach
was developed chronologically before the PDetMSE formulation, we present PMSE in this
paper as a lower complexity approximation of the PDetMSE formulation.
The main contributions of this paper are: 

• Deriving the maximum achievable information rates for both joint and scalar processing 
under linear precoding and formulating the joint (PDetMSE) and scalar processing (PMSE) 
based sum rate maximization problems using MSE expressions. 

• Proposing solutions to these optimization problems based on uplink-downlink duality, and 
addressing several issues regarding algorithm implementation. 

• Analyzing the performance of our proposed schemes in comparison to the DPC sum capacity 
and to orthogonalization based approaches. We demonstrate that a performance improvement 
is made in narrowing the gap to capacity at practical values of transmit SNR, and show 
that the PDetMSE approach provides the best performance of all proposed schemes. 

The remainder of this paper is organized as follows. Section II describes the system model
used and states the assumptions made. Section III derives the performance upper bound for
the achievable sum rate under linear precoding, and develops the use of the product of MSE
matrix determinants as the optimization criterion for joint processing. Section IV investigates
a suboptimal framework based on the product of mean squared errors and proposes a computationally
feasible scheme for implementation. Results of simulations testing the effectiveness
of the proposed approaches are presented in Section V. Finally, we draw our conclusions in
Section VI.

Notation: Lower case italics, e.g., $x$, represent scalars while lower case boldface type is used for vectors (e.g., $\mathbf{x}$). Upper case italics, e.g., $N$, are used for constants and upper case boldface represents matrices, e.g., $\mathbf{X}$. Entries in vectors and matrices are denoted as $[\mathbf{x}]_i$ and $[\mathbf{X}]_{i,j}$, respectively. The superscripts $T$ and $H$ denote the transpose and Hermitian operators. $E[\cdot]$ represents the statistical expectation operator while $\mathbf{I}_N$ is the $N \times N$ identity matrix; $\mathrm{tr}[\cdot]$ and $\det(\cdot)$ are the trace and determinant operators. $\|\mathbf{x}\|_1$ and $\|\mathbf{x}\|_2$ denote the 1-norm (the sum of the absolute values of the entries) and the Euclidean norm, $\mathrm{diag}(\mathbf{x})$ represents the diagonal matrix formed using the entries in vector $\mathbf{x}$, and $\mathrm{diag}[\mathbf{X}_1, \ldots, \mathbf{X}_K]$ is the block diagonal concatenation of matrices $\mathbf{X}_1, \ldots, \mathbf{X}_K$. $\mathbf{A} \succ 0$ and $\mathbf{B} \succeq 0$ indicate that $\mathbf{A}$ and $\mathbf{B}$ are positive definite and positive semidefinite matrices, respectively. $\mathbf{e}_{\max}(\mathbf{A}, \mathbf{B})$ is the unit Euclidean norm eigenvector $\mathbf{x}$ corresponding to the largest eigenvalue $\lambda$ in the generalized eigenproblem $\mathbf{A}\mathbf{x} = \lambda\mathbf{B}\mathbf{x}$. Finally, $\mathcal{CN}(m, \sigma^2)$ denotes the complex Gaussian probability distribution with mean $m$ and variance $\sigma^2$.

Fig. 1. Processing for user $k$ in downlink and virtual uplink.

II. System Model with Linear Precoding 

The system under consideration, illustrated in Fig. 1, comprises a base station with $M$ antennas transmitting to $K$ decentralized users over flat wireless channels. User $k$ is equipped with $N_k$ antennas and receives $L_k$ data streams from the base station. Thus, we have $M$ transmit antennas transmitting a total of $L = \sum_{k=1}^{K} L_k$ symbols to $K$ users, who, together, have a total of $N = \sum_{k=1}^{K} N_k$ receive antennas. The data symbols for user $k$ are collected in the data vector $\mathbf{x}_k = [x_{k1}, x_{k2}, \ldots, x_{kL_k}]^T$ and the overall data vector is $\mathbf{x} = [\mathbf{x}_1^T, \mathbf{x}_2^T, \ldots, \mathbf{x}_K^T]^T$. We assume that the modulated data symbols $\mathbf{x}$ are independent with unit average energy ($E[\mathbf{x}\mathbf{x}^H] = \mathbf{I}_L$). User $k$'s data streams are processed by the $M \times L_k$ transmit filter $\mathbf{U}_k = [\mathbf{u}_{k1}, \ldots, \mathbf{u}_{kL_k}]$ before being transmitted over the $M$ antennas; $\mathbf{u}_{kj}$ is the precoder for stream $j$ of user $k$, and has unit power, $\|\mathbf{u}_{kj}\|_2 = 1$. Together, these individual precoders form the $M \times L$ global transmit precoder matrix $\mathbf{U} = [\mathbf{U}_1, \mathbf{U}_2, \ldots, \mathbf{U}_K]$. Let $p_{kj}$ be the power allocated to stream $j$ of user $k$ and the downlink transmit power vector for user $k$ be $\mathbf{p}_k = [p_{k1}, p_{k2}, \ldots, p_{kL_k}]^T$, with $\mathbf{p} = [\mathbf{p}_1^T, \ldots, \mathbf{p}_K^T]^T$. Define $\mathbf{P}_k = \mathrm{diag}\{\mathbf{p}_k\}$ and $\mathbf{P} = \mathrm{diag}\{\mathbf{p}\}$. The channel between the transmitter and user $k$ is represented by the $N_k \times M$ matrix $\mathbf{H}_k^H$. The overall $N \times M$ channel matrix is $\mathbf{H}^H$, with $\mathbf{H} = [\mathbf{H}_1, \mathbf{H}_2, \ldots, \mathbf{H}_K]$. The transmitter is assumed to know the channel perfectly.
Based on this model, user $k$ receives a length-$N_k$ vector

$$\mathbf{y}_k = \mathbf{H}_k^H \mathbf{U} \sqrt{\mathbf{P}}\, \mathbf{x} + \mathbf{n}_k,$$

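where $\mathbf{n}_k$ consists of the additive white Gaussian noise (AWGN) at the user's receive antennas with i.i.d. entries $[\mathbf{n}_k]_i \sim \mathcal{CN}(0, \sigma^2)$; that is, $E[\mathbf{n}_k \mathbf{n}_k^H] = \sigma^2 \mathbf{I}_{N_k}$. To estimate its $L_k$ symbols $\mathbf{x}_k$, user $k$ processes $\mathbf{y}_k$ with its $L_k \times N_k$ decoder matrix $\mathbf{V}_k^H$, resulting in

$$\hat{\mathbf{x}}_k^{DL} = \mathbf{V}_k^H \mathbf{H}_k^H \mathbf{U} \sqrt{\mathbf{P}}\, \mathbf{x} + \mathbf{V}_k^H \mathbf{n}_k,$$

where the superscript $DL$ indicates the downlink. The global receive filter $\mathbf{V}^H$ is a block diagonal matrix of dimension $L \times N$, where $\mathbf{V} = \mathrm{diag}[\mathbf{V}_1, \mathbf{V}_2, \ldots, \mathbf{V}_K]$ and each $\mathbf{V}_k = [\mathbf{v}_{k1}, \ldots, \mathbf{v}_{kL_k}]$.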
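The MSE matrix for user $k$ in the downlink under these general precoder and decoder matrices can be written as

$$\begin{aligned}
\mathbf{E}_k^{DL} &= E\left[(\hat{\mathbf{x}}_k - \mathbf{x}_k)(\hat{\mathbf{x}}_k - \mathbf{x}_k)^H\right] \\
&= \mathbf{V}_k^H \mathbf{H}_k^H \mathbf{U}\mathbf{P}\mathbf{U}^H \mathbf{H}_k \mathbf{V}_k + \sigma^2 \mathbf{V}_k^H \mathbf{V}_k - \mathbf{V}_k^H \mathbf{H}_k^H \mathbf{U}_k \sqrt{\mathbf{P}_k} - \sqrt{\mathbf{P}_k}\,\mathbf{U}_k^H \mathbf{H}_k \mathbf{V}_k + \mathbf{I}_{L_k}.
\end{aligned} \tag{1}$$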
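As a sanity check of (1), the short sketch below evaluates the downlink MSE matrix for an arbitrary decoder with NumPy. The array layout (each $\mathbf{H}_k$ stored as an $M \times N_k$ array, user $k$'s streams selected by an index array `idx`) and the function name `downlink_mse` are assumptions made for illustration. Setting the decoder to zero must return $\mathbf{I}_{L_k}$, since the estimate is then identically zero.

```python
import numpy as np

def downlink_mse(H_k, U, p, idx, V_k, sigma2):
    """Downlink MSE matrix E_k^DL of eq. (1) (illustrative sketch).

    H_k    : (M, N_k) array; the downlink channel of user k is H_k^H
    U      : (M, L) unit-norm precoders of all streams
    p      : (L,) downlink stream powers
    idx    : indices of user k's streams (columns of U)
    V_k    : (N_k, L_k) decoder, applied as V_k^H
    sigma2 : noise variance
    """
    G = H_k.conj().T @ U                                        # H_k^H U, shape (N_k, L)
    A = V_k.conj().T @ G[:, idx] @ np.diag(np.sqrt(p[idx]))     # V_k^H H_k^H U_k sqrt(P_k)
    return (V_k.conj().T @ G @ np.diag(p) @ G.conj().T @ V_k    # signal plus interference
            + sigma2 * V_k.conj().T @ V_k                       # filtered noise
            - A - A.conj().T + np.eye(len(idx)))

rng = np.random.default_rng(0)
M, N_k, L, sigma2 = 4, 2, 4, 0.1
H_k = (rng.standard_normal((M, N_k)) + 1j * rng.standard_normal((M, N_k))) / np.sqrt(2)
U = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))
U /= np.linalg.norm(U, axis=0)             # unit-norm precoder columns
p = np.full(L, 1.0 / L)                    # uniform powers, P_max = 1
idx = np.array([0, 1])                     # user k owns streams 0 and 1
V_k = rng.standard_normal((N_k, 2)) + 1j * rng.standard_normal((N_k, 2))

E_zero = downlink_mse(H_k, U, p, idx, np.zeros((N_k, 2)), sigma2)
E_rand = downlink_mse(H_k, U, p, idx, V_k, sigma2)
```

Because (1) is an error covariance matrix, `E_rand` is Hermitian positive semidefinite for any decoder, not only for the MMSE one.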

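We will make use of the dual virtual uplink, also illustrated in Fig. 1, with the same channels between users and base station. In the uplink, user $k$ transmits $L_k$ data streams. Let the uplink transmit power vector for user $k$ be $\mathbf{q}_k = [q_{k1}, q_{k2}, \ldots, q_{kL_k}]^T$, with $\mathbf{q} = [\mathbf{q}_1^T, \ldots, \mathbf{q}_K^T]^T$, and define $\mathbf{Q}_k = \mathrm{diag}\{\mathbf{q}_k\}$ and $\mathbf{Q} = \mathrm{diag}\{\mathbf{q}\}$. The transmit and receive filters for user $k$ become $\mathbf{V}_k$ and $\mathbf{U}_k$, respectively. As in the downlink, the precoder for the virtual uplink contains columns with unit norm; that is, $\|\mathbf{v}_{kj}\|_2 = 1$. The received vector at the base station and the estimated symbol vector for user $k$ are

$$\mathbf{y} = \sum_{i=1}^{K} \mathbf{H}_i \mathbf{V}_i \sqrt{\mathbf{Q}_i}\, \mathbf{x}_i + \mathbf{n},$$

$$\hat{\mathbf{x}}_k^{UL} = \sum_{i=1}^{K} \mathbf{U}_k^H \mathbf{H}_i \mathbf{V}_i \sqrt{\mathbf{Q}_i}\, \mathbf{x}_i + \mathbf{U}_k^H \mathbf{n}.$$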
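The noise term, $\mathbf{n}$, is again AWGN with $E[\mathbf{n}\mathbf{n}^H] = \sigma^2 \mathbf{I}_M$.

We define a useful virtual uplink receive covariance matrix as

$$\mathbf{J} = \sum_{i=1}^{K} \mathbf{H}_i \mathbf{V}_i \mathbf{Q}_i \mathbf{V}_i^H \mathbf{H}_i^H + \sigma^2 \mathbf{I}_M = \mathbf{H}\mathbf{V}\mathbf{Q}\mathbf{V}^H\mathbf{H}^H + \sigma^2 \mathbf{I}_M.$$

The global MSE matrix for all users in the virtual uplink can then be expressed as

$$\mathbf{E}^{UL} = E\left[(\hat{\mathbf{x}}^{UL} - \mathbf{x})(\hat{\mathbf{x}}^{UL} - \mathbf{x})^H\right] = \mathbf{U}^H\mathbf{J}\mathbf{U} - \mathbf{U}^H\mathbf{H}\mathbf{V}\sqrt{\mathbf{Q}} - \sqrt{\mathbf{Q}}\,\mathbf{V}^H\mathbf{H}^H\mathbf{U} + \mathbf{I}_L. \tag{2}$$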
III. Linear Precoding and Sum Rate Maximization



In this section, we formulate the sum rate maximization problem under linear precoding in 
the broadcast channel. We begin by introducing the information theoretic DPC upper bound, and 
then derive the performance upper bound achievable under linear precoding. We then derive an 
equivalent formulation in terms of MSE expressions, and propose the PDetMSE based scheme 
for achieving this optimal sum rate performance under linear precoding. 



A. Sum Capacity and Dirty Paper Coding 

Information theoretic approaches characterize the sum capacity of the multiuser MIMO downlink by computing the sum capacity of the equivalent uplink multiple access channel (MAC) and applying a duality result [4], [5]. The BC sum capacity can thus be expressed as

$$\begin{aligned}
R_{\text{sum}} = \max_{\{\mathbf{S}_k\}} \;& \log\det\left(\mathbf{I}_M + \frac{1}{\sigma^2}\sum_{k=1}^{K}\mathbf{H}_k\mathbf{S}_k\mathbf{H}_k^H\right) \\
\text{s.t.}\;& \mathbf{S}_k \succeq 0, \quad k = 1, \ldots, K \\
& \sum_{k=1}^{K}\mathrm{tr}[\mathbf{S}_k] \le P_{\max},
\end{aligned}$$

where $\mathbf{S}_k$ is the uplink transmit covariance matrix for mobile user $k$, and $P_{\max}$ is the maximum sum power over all users. Note that this optimization problem is convex (the objective is concave in $\{\mathbf{S}_k\}$ and the constraints are convex) and is hence relatively easy to solve. This result does not translate to linear precoding.
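The objective of the sum capacity problem is straightforward to evaluate for a given set of feasible covariances, as in the sketch below. It only computes the objective (in bits/s/Hz, i.e., with $\log_2$); the maximization itself would need a convex solver or the iterative algorithms of [6], [7], which are not shown. The function name `mac_sum_rate` and the equal-power covariances in the example are illustrative assumptions.

```python
import numpy as np

def mac_sum_rate(H_list, S_list, sigma2):
    """Objective of the sum capacity problem in bits/s/Hz (illustrative sketch).

    H_list[k] : (M, N_k) uplink (MAC) channel of user k
    S_list[k] : (N_k, N_k) positive semidefinite uplink transmit covariance of user k
    """
    M = H_list[0].shape[0]
    A = np.eye(M, dtype=complex)
    for H_k, S_k in zip(H_list, S_list):
        A = A + H_k @ S_k @ H_k.conj().T / sigma2
    _, logabsdet = np.linalg.slogdet(A)      # A is Hermitian positive definite
    return logabsdet / np.log(2)

# Scalar check: one single-antenna user, |h|^2 = 4, power 0.5, sigma^2 = 1 -> log2(3).
r_scalar = mac_sum_rate([np.array([[2.0]])], [np.array([[0.5]])], 1.0)

# Two users with a feasible (not optimized) equal power split.
rng = np.random.default_rng(0)
M, N_k, P_max, sigma2 = 4, 2, 1.0, 0.1
H_list = [(rng.standard_normal((M, N_k)) + 1j * rng.standard_normal((M, N_k))) / np.sqrt(2)
          for _ in range(2)]
S_list = [np.eye(N_k) * P_max / (2 * N_k) for _ in range(2)]
r_uniform = mac_sum_rate(H_list, S_list, sigma2)
```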






B. Achievable Sum Rate under Linear Precoding 

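The achievable rate for a single-user MIMO channel is $\log(\det(\mathbf{K}_1 + \mathbf{K}_2)/\det(\mathbf{K}_2))$, where $\mathbf{K}_1$ is the received signal covariance and $\mathbf{K}_2$ is the noise covariance [36]. Under single-user decoding, multiuser interference is treated as noise, and user $k$ can achieve rate $R_k$ in the downlink using transmit covariance $\boldsymbol{\Sigma}_k$:

$$R_k = \log\frac{\det\left(\sum_{i=1}^{K}\mathbf{H}_k^H\boldsymbol{\Sigma}_i\mathbf{H}_k + \sigma^2\mathbf{I}\right)}{\det\left(\sum_{i\neq k}\mathbf{H}_k^H\boldsymbol{\Sigma}_i\mathbf{H}_k + \sigma^2\mathbf{I}\right)}.$$

Under the system model described in Section II, user $k$ transmits with covariance matrix $\boldsymbol{\Sigma}_k = \mathbf{U}_k\mathbf{P}_k\mathbf{U}_k^H$. The achievable rate for user $k$ under linear precoding is therefore

$$R_k = \log\frac{\det\left(\sum_{i=1}^{K}\mathbf{H}_k^H\mathbf{U}_i\mathbf{P}_i\mathbf{U}_i^H\mathbf{H}_k + \sigma^2\mathbf{I}\right)}{\det\left(\sum_{i\neq k}\mathbf{H}_k^H\mathbf{U}_i\mathbf{P}_i\mathbf{U}_i^H\mathbf{H}_k + \sigma^2\mathbf{I}\right)} = \log\frac{\det\mathbf{J}_k}{\det\mathbf{R}_{N+I,k}}, \tag{3}$$

where $\mathbf{J}_k = \mathbf{H}_k^H\mathbf{U}\mathbf{P}\mathbf{U}^H\mathbf{H}_k + \sigma^2\mathbf{I}$ and $\mathbf{R}_{N+I,k} = \mathbf{J}_k - \mathbf{H}_k^H\mathbf{U}_k\mathbf{P}_k\mathbf{U}_k^H\mathbf{H}_k$ are the received signal covariance matrix and the noise-plus-interference covariance matrix at user $k$, respectively.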
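The per-user rates in (3) can be computed directly from $\mathbf{J}_k$ and $\mathbf{R}_{N+I,k}$. The sketch below assumes the same array layout as before (each $\mathbf{H}_k$ stored as $M \times N_k$, streams of user $k$ given as column indices of $\mathbf{U}$); the function name `user_rates` is hypothetical, and $\log_2$ is used so the result is in bits/s/Hz. For one user with one antenna and one stream, (3) must collapse to $\log_2(1 + p|\mathbf{h}^H\mathbf{u}|^2/\sigma^2)$.

```python
import numpy as np

def user_rates(H_list, U, p, groups, sigma2):
    """Per-user rates R_k of eq. (3) in bits/s/Hz (illustrative sketch).

    H_list[k] : (M, N_k); the downlink channel of user k is H_k^H
    U, p      : (M, L) unit-norm precoders and (L,) stream powers
    groups[k] : indices of the columns of U that carry user k's streams
    """
    rates = []
    for H_k, idx in zip(H_list, groups):
        G = H_k.conj().T @ U                                     # H_k^H U
        J_k = G @ np.diag(p) @ G.conj().T + sigma2 * np.eye(G.shape[0])
        own = G[:, idx] @ np.diag(p[idx]) @ G[:, idx].conj().T   # user k's own signal
        R_NI = J_k - own                                         # noise plus interference
        rates.append((np.linalg.slogdet(J_k)[1] - np.linalg.slogdet(R_NI)[1]) / np.log(2))
    return np.array(rates)

# Single user, one receive antenna, one stream.
rng = np.random.default_rng(0)
h = rng.standard_normal((3, 1)) + 1j * rng.standard_normal((3, 1))
u = rng.standard_normal((3, 1)) + 1j * rng.standard_normal((3, 1))
u /= np.linalg.norm(u)
r_single = user_rates([h], u, np.array([0.7]), [np.array([0])], 0.2)[0]
r_closed = np.log2(1 + 0.7 * abs((h.conj().T @ u)[0, 0]) ** 2 / 0.2)

# Two users with two streams each and uniform powers.
H_list = [rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2)) for _ in range(2)]
U2 = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U2 /= np.linalg.norm(U2, axis=0)
rates = user_rates(H_list, U2, np.full(4, 0.25), [np.arange(0, 2), np.arange(2, 4)], 0.1)
```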

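The rate maximization problem with a sum power constraint under linear precoding can then be formulated as

$$\begin{aligned}
(\mathbf{U}, \mathbf{P}) = \arg\max_{\mathbf{U}, \mathbf{P}} \;& \log\prod_{k=1}^{K}\frac{\det\mathbf{J}_k}{\det\mathbf{R}_{N+I,k}} \\
\text{s.t.}\;& \|\mathbf{u}_{kj}\|_2 = 1, \quad k = 1, \ldots, K, \; j = 1, \ldots, L_k \\
& p_{kj} \ge 0, \quad k = 1, \ldots, K, \; j = 1, \ldots, L_k \\
& \|\mathbf{p}\|_1 = \sum_{k=1}^{K}\sum_{j=1}^{L_k} p_{kj} \le P_{\max}.
\end{aligned} \tag{4}$$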

C. MSE Formulation: Product of MSE Matrix Determinants 

In this subsection, we show that an MSE-based formulation using joint processing of all streams
(rather than treating each user's own data streams as interference) leads to an equivalent optimal
formulation of the rate maximization problem under linear processing. We develop this
relationship by using the MSE matrix determinants.



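First, consider the linear MMSE decoder for user $k$, $\mathbf{V}_k$,

$$\mathbf{V}_k = \left(\mathbf{H}_k^H\mathbf{U}\mathbf{P}\mathbf{U}^H\mathbf{H}_k + \sigma^2\mathbf{I}\right)^{-1}\mathbf{H}_k^H\mathbf{U}_k\sqrt{\mathbf{P}_k}. \tag{5}$$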
When using this matrix as the receiver in (1), the downlink MSE matrix for user $k$ can be simplified as

$$\mathbf{E}_k^{DL} = \mathbf{I}_{L_k} - \sqrt{\mathbf{P}_k}\,\mathbf{U}_k^H\mathbf{H}_k\mathbf{J}_k^{-1}\mathbf{H}_k^H\mathbf{U}_k\sqrt{\mathbf{P}_k}. \tag{6}$$

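Consider the following optimization problem which minimizes the product of the determinants of the downlink MSE matrices under a sum power constraint:

$$\begin{aligned}
(\mathbf{U}, \mathbf{P}) = \arg\min_{\mathbf{U}, \mathbf{P}} \;& \prod_{k=1}^{K}\det\mathbf{E}_k^{DL} \\
\text{s.t.}\;& \|\mathbf{u}_{kj}\|_2 = 1, \quad k = 1, \ldots, K, \; j = 1, \ldots, L_k \\
& p_{kj} \ge 0, \quad k = 1, \ldots, K, \; j = 1, \ldots, L_k \\
& \|\mathbf{p}\|_1 = \sum_{k=1}^{K}\sum_{j=1}^{L_k} p_{kj} \le P_{\max}.
\end{aligned} \tag{7}$$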

Theorem 1: Under linear MMSE decoding at the users, the sum rate maximization
problem in (4) and the PDetMSE minimization problem in (7) are equivalent.
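Proof: The determinant of the downlink MSE matrix can be written as

$$\begin{aligned}
\det\mathbf{E}_k^{DL} &= \det\left(\mathbf{I}_{N_k} - \mathbf{H}_k^H\mathbf{U}_k\mathbf{P}_k\mathbf{U}_k^H\mathbf{H}_k\mathbf{J}_k^{-1}\right) \\
&= \det\left(\mathbf{J}_k - \mathbf{H}_k^H\mathbf{U}_k\mathbf{P}_k\mathbf{U}_k^H\mathbf{H}_k\right)\det\left(\mathbf{J}_k^{-1}\right) \\
&= \frac{\det\mathbf{R}_{N+I,k}}{\det\mathbf{J}_k},
\end{aligned} \tag{8}$$

where (8) follows from (6) since $\det(\mathbf{I} + \mathbf{A}\mathbf{B}) = \det(\mathbf{I} + \mathbf{B}\mathbf{A})$ when $\mathbf{A}$ and $\mathbf{B}$ have appropriate dimensions. We then see the relationship to (3),

$$\log\det\mathbf{E}_k^{DL} = -\log\frac{\det\mathbf{J}_k}{\det\mathbf{R}_{N+I,k}} = -R_k.$$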

With this result, we can see that under MMSE reception using $\mathbf{V}_k$ as defined in (5), minimizing the determinant of the MSE matrix $\mathbf{E}_k^{DL}$ is equivalent to maximizing the achievable rate for user $k$. It follows that minimizing the product of MSE matrix determinants over all users is equivalent to sum rate maximization,

$$\arg\min\prod_{k=1}^{K}\det\mathbf{E}_k^{DL} = \arg\min\sum_{k=1}^{K}\log\det\mathbf{E}_k^{DL} = \arg\max\sum_{k=1}^{K}R_k, \tag{9}$$

where (9) holds since $\log(\cdot)$ is a monotonically increasing function of its argument. ■

Note that this result yields an upper bound on the sum rate achievable by any linear precoding
scheme in the broadcast channel.
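Theorem 1 can be checked numerically: with the MMSE decoder (5), the log-determinant of (6) equals $-R_k$ from (3). The sketch below uses natural logarithms (nats) and arbitrary random channels, precoders and powers, which are assumptions made purely for the check; the helper `crandn` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, N_k, L_k, sigma2 = 4, 2, 2, 2, 0.1
L = K * L_k

def crandn(*shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H_list = [crandn(M, N_k) for _ in range(K)]      # downlink channel of user k is H_k^H
U = crandn(M, L)
U /= np.linalg.norm(U, axis=0)                   # unit-norm precoders
p = rng.uniform(0.1, 1.0, L)
p /= p.sum()                                     # sum power P_max = 1
groups = [np.arange(k * L_k, (k + 1) * L_k) for k in range(K)]

logdetE, rates = [], []
for H_k, idx in zip(H_list, groups):
    G = H_k.conj().T @ U                         # H_k^H U
    J_k = G @ np.diag(p) @ G.conj().T + sigma2 * np.eye(N_k)
    B = G[:, idx] @ np.diag(np.sqrt(p[idx]))     # H_k^H U_k sqrt(P_k)
    V_k = np.linalg.solve(J_k, B)                # MMSE decoder, eq. (5)
    E_k = np.eye(L_k) - B.conj().T @ V_k         # MSE matrix, eq. (6)
    R_NI = J_k - B @ B.conj().T                  # noise-plus-interference covariance
    logdetE.append(np.linalg.slogdet(E_k)[1])
    rates.append(np.linalg.slogdet(J_k)[1] - np.linalg.slogdet(R_NI)[1])  # R_k in nats
```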

The covariance matrices $\mathbf{J}_k$ and $\mathbf{R}_{N+I,k}$ in the MSE matrix $\mathbf{E}_k^{DL}$ are each functions of all precoder and power allocation matrices. Thus, the rates $R_k$ of the individual users (and hence the sum rate) are coupled across users. As such, finding $\mathbf{U}$ and $\mathbf{P}$ jointly or finding only the power allocation $\mathbf{P}$ for a fixed $\mathbf{U}$ are both non-convex problems and are just as difficult to solve as the rate maximization problem.

In the sum capacity and SMSE problems, the problem of non-convexity is addressed by solving 
a convex virtual uplink formulation and applying a duality-based transformation. Unfortunately, 
the sum rate expression under linear precoding in the virtual uplink is nearly identical to that 
derived above for the downlink, and does not admit a cancellation or grouping of terms to 
decouple the problem across users. 

Direct solution of the non-convex downlink problem for minimizing the product of MSE matrix
determinants requires finding a complex $M \times L$ precoder matrix. We consider the application
of sequential quadratic programming (SQP) [37] to solve this problem. SQP solves successive 
approximations of a constrained optimization problem and is guaranteed to converge to the 
optimum value for convex problems; however, in the case of this non-convex optimization 
problem, SQP can only guarantee convergence to a local minimum. 
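A minimal sketch of the SQP route is given below, assuming SciPy's SLSQP solver as the SQP implementation and a reparametrization that is not used in the text: the columns of $\mathbf{W} = \mathbf{U}\sqrt{\mathbf{P}}$ are optimized directly, so the unit-norm and nonnegativity constraints of (7) disappear and the power constraint becomes $\|\mathbf{W}\|_F^2 \le P_{\max}$. The objective is $\sum_k \log_2\det\mathbf{E}_k^{DL}$ under MMSE decoding, which by Theorem 1 equals minus the sum rate. Gradients are approximated by finite differences, the scenario is arbitrary, and, as noted above, the result is only a local optimum.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
M, K, N_k, L_k, sigma2, P_max = 4, 2, 2, 2, 0.1, 1.0
L = K * L_k
H_list = [(rng.standard_normal((M, N_k)) + 1j * rng.standard_normal((M, N_k))) / np.sqrt(2)
          for _ in range(K)]
groups = [np.arange(k * L_k, (k + 1) * L_k) for k in range(K)]

def unpack(x):
    """Real vector -> complex W = U sqrt(P) (hypothetical parametrization)."""
    return (x[:M * L] + 1j * x[M * L:]).reshape(M, L)

def sum_log2det_E(x):
    """sum_k log2 det E_k^DL with MMSE decoders, which equals -sum_k R_k."""
    W = unpack(x)
    total = 0.0
    for H_k, idx in zip(H_list, groups):
        G = H_k.conj().T @ W                     # H_k^H U sqrt(P)
        J_k = G @ G.conj().T + sigma2 * np.eye(N_k)
        R_NI = J_k - G[:, idx] @ G[:, idx].conj().T
        total += np.linalg.slogdet(R_NI)[1] - np.linalg.slogdet(J_k)[1]
    return total / np.log(2)

W0 = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))
W0 *= np.sqrt(P_max) / np.linalg.norm(W0)       # feasible start at full power
x0 = np.concatenate([W0.real.ravel(), W0.imag.ravel()])
power = {"type": "ineq", "fun": lambda x: P_max - np.sum(x ** 2)}   # ||W||_F^2 <= P_max
res = minimize(sum_log2det_E, x0, method="SLSQP", constraints=[power],
               options={"maxiter": 200})

W = unpack(res.x)
p = np.sum(np.abs(W) ** 2, axis=0)               # stream powers p_kj
U = W / np.where(p > 0, np.sqrt(p), 1.0)         # unit-norm precoders u_kj
sum_rate = -res.fun                              # bits/s/Hz
```

Even for this small example there are $2ML = 32$ real variables, which illustrates why the approach scales poorly.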

This computationally intensive approach is the only available option in the absence of a 
convex virtual uplink formulation. Moreover, the numerical techniques used for solving nonlinear 
problems do not guarantee convergence to the global minimum. This is clearly not a desirable 
method for finding a practical precoder, especially when one of our major motivations for using 
linear precoding is reducing transmitter complexity. We do not suggest that this method be 
practically implemented; rather, we use it to illustrate the difference in performance between the 
solutions to the optimal PDetMSE formulation and the more practical PMSE algorithm that we 
propose in the following section. 

IV. Scalar Processing and the Product of Mean Squared Errors 

Given the complexity of the PDetMSE solution, we consider PMSE minimization as a suboptimal
(but computationally feasible) approximation to rate maximization in the multiuser MIMO downlink.






In [35], the single-user rate maximization problem using linear precoding is solved by minimizing 
the determinant of the MSE matrix. This solution is equivalent to minimizing the product of 
individual stream MSEs because the problem is scalarized by diagonalization of both the channel 
and MSE matrices. It was recently demonstrated in [38] that the MSE matrices can also be 
diagonalized in the multiuser case by applying unitary transformations to the precoder and 
decoders; however, in the absence of orthogonalizing precoders (e.g., BD or ZF), minimization 
of the PMSE yields a different solution from minimizing the PDetMSE. 

The PMSE approach, based on scalar processing of the individual stream MSEs for each 
user, follows from the treatment of the optimization problems in [26], [27], where non-convex 
problems in the downlink are transformed to convex problems in the dual uplink. With this 
motivation in mind, we consider formulating the scalar optimization problem directly in the 
virtual uplink, and transforming the resulting solution to the downlink using the uplink-downlink 
MSE duality in [26], [27]. 

A. Achievable Sum Rate using Scalar Processing 

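In the scalarized version of the rate maximization problem, the user's own data streams ($l \neq j$) are considered as self-interference in addition to the multiuser interference. The achievable rate for user $k$'s substream $j$ can thus be expressed as

$$R_{kj} = \log\left(1 + \gamma_{kj}^{UL}\right),$$

where

$$\gamma_{kj}^{UL} = \frac{\mathbf{u}_{kj}^H\mathbf{H}_k\mathbf{v}_{kj}q_{kj}\mathbf{v}_{kj}^H\mathbf{H}_k^H\mathbf{u}_{kj}}{\mathbf{u}_{kj}^H\mathbf{J}_{kj}\mathbf{u}_{kj}} \tag{10}$$

is the SINR and $\mathbf{J}_{kj} = \mathbf{J} - \mathbf{H}_k\mathbf{v}_{kj}q_{kj}\mathbf{v}_{kj}^H\mathbf{H}_k^H$ is the virtual uplink interference-plus-noise covariance matrix for stream $j$ of user $k$.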
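The SINR in (10) can be evaluated for any receive filter. In the sketch below, the effective channels $\mathbf{H}_k\mathbf{v}_{kj}$ are replaced by arbitrary random vectors (an assumption for illustration) and `uplink_sinr` is a hypothetical helper. By the Cauchy-Schwarz inequality no filter exceeds $q_{kj}\mathbf{h}^H\mathbf{J}_{kj}^{-1}\mathbf{h}$ with $\mathbf{h} = \mathbf{H}_k\mathbf{v}_{kj}$, and a filter along $\mathbf{J}^{-1}\mathbf{h}$ attains this value.

```python
import numpy as np

def uplink_sinr(u, h, J, q):
    """Virtual uplink SINR of eq. (10) for one stream (illustrative sketch).

    u : (M,) base-station receive filter u_kj
    h : (M,) effective channel H_k v_kj of the stream
    J : (M, M) total virtual uplink receive covariance
    q : uplink power q_kj of the stream
    """
    J_kj = J - q * np.outer(h, h.conj())            # interference-plus-noise covariance
    return q * abs(np.vdot(u, h)) ** 2 / np.real(np.vdot(u, J_kj @ u))

rng = np.random.default_rng(3)
M, sigma2 = 4, 0.1
Hbar = rng.standard_normal((M, 3)) + 1j * rng.standard_normal((M, 3))   # columns H_k v_kj
q = np.array([0.5, 0.3, 0.2])
J = Hbar @ np.diag(q) @ Hbar.conj().T + sigma2 * np.eye(M)
h = Hbar[:, 0]

u_rand = rng.standard_normal(M) + 1j * rng.standard_normal(M)
u_mmse = np.linalg.solve(J, h)                      # direction of the MMSE receiver
J_0 = J - q[0] * np.outer(h, h.conj())
sinr_bound = np.real(q[0] * np.vdot(h, np.linalg.solve(J_0, h)))   # q h^H J_kj^{-1} h
rate_mmse = np.log2(1 + uplink_sinr(u_mmse, h, J, q[0]))          # R_kj in bits/s/Hz
```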

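The scalar rate maximization problem with a sum power constraint under linear precoding can thus be written as

$$\begin{aligned}
(\mathbf{V}, \mathbf{Q}) = \arg\max_{\mathbf{V}, \mathbf{Q}} \;& \sum_{k=1}^{K}\sum_{j=1}^{L_k}\log\left(1 + \gamma_{kj}^{UL}\right) \\
\text{s.t.}\;& \|\mathbf{v}_{kj}\|_2 = 1, \quad k = 1, \ldots, K, \; j = 1, \ldots, L_k \\
& q_{kj} \ge 0, \quad k = 1, \ldots, K, \; j = 1, \ldots, L_k \\
& \|\mathbf{q}\|_1 = \sum_{k=1}^{K}\sum_{j=1}^{L_k} q_{kj} \le P_{\max}.
\end{aligned} \tag{11}$$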






B. MSE Formulation: Product of Mean Squared Errors 

With this scalar processing rate maximization problem in mind, we consider the MSE-equivalent formulation. We begin by finding the optimum linear receiver, and can see from (10) that $\mathbf{u}_{kj}$ does not depend on any other columns of $\mathbf{U}$. Furthermore, it is the solution to the generalized eigenproblem

$$\mathbf{u}_{kj}^{\text{opt}} = \mathbf{e}_{\max}\left(\mathbf{H}_k\mathbf{v}_{kj}q_{kj}\mathbf{v}_{kj}^H\mathbf{H}_k^H,\; \mathbf{J}_{kj}\right).$$

Within a normalizing factor, this solution is equivalent to the MMSE receiver,

$$\mathbf{u}_{kj} = \mathbf{J}^{-1}\mathbf{H}_k\mathbf{v}_{kj}\sqrt{q_{kj}}. \tag{12}$$
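The equivalence between the generalized eigenvector and the MMSE receiver (12) can be verified with SciPy's generalized Hermitian eigensolver `scipy.linalg.eigh(a, b)`. The effective channels are again random stand-ins chosen for illustration. Note that `eigh` returns eigenvalues in ascending order and eigenvectors normalized with respect to $\mathbf{J}_{kj}$, so the last column is rescaled to unit Euclidean norm to match $\mathbf{e}_{\max}$.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(4)
M, sigma2 = 4, 0.1
Hbar = rng.standard_normal((M, 3)) + 1j * rng.standard_normal((M, 3))   # columns H_k v_kj
q = np.array([0.5, 0.3, 0.2])
J = Hbar @ np.diag(q) @ Hbar.conj().T + sigma2 * np.eye(M)
h, q0 = Hbar[:, 0], q[0]

A = q0 * np.outer(h, h.conj())                  # H_k v_kj q_kj v_kj^H H_k^H
J_kj = J - A                                    # interference-plus-noise covariance
w, X = eigh(A, J_kj)                            # solves A x = lambda J_kj x, ascending order
u_eig = X[:, -1] / np.linalg.norm(X[:, -1])     # e_max(A, J_kj)

u_mmse = np.sqrt(q0) * np.linalg.solve(J, h)    # eq. (12)
u_mmse /= np.linalg.norm(u_mmse)
alignment = abs(np.vdot(u_eig, u_mmse))         # 1.0 means equal up to a phase
```

The largest generalized eigenvalue `w[-1]` is the maximum SINR $q_{kj}\mathbf{h}^H\mathbf{J}_{kj}^{-1}\mathbf{h}$, since the numerator matrix has rank one.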



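When the MMSE receiver in (12) is used, the virtual uplink MSE matrix (2) reduces to

$$\mathbf{E}^{UL} = \mathbf{I}_L - \sqrt{\mathbf{Q}}\,\mathbf{V}^H\mathbf{H}^H\mathbf{J}^{-1}\mathbf{H}\mathbf{V}\sqrt{\mathbf{Q}}.$$

Thus, the mean squared error for user $k$'s $j$th stream is diagonal entry $j$ in block $k$ of $\mathbf{E}^{UL}$,

$$e_{kj} = 1 - q_{kj}\mathbf{v}_{kj}^H\mathbf{H}_k^H\mathbf{J}^{-1}\mathbf{H}_k\mathbf{v}_{kj}.$$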

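Now consider another optimization problem, minimizing the product of mean squared errors (PMSE) under a sum power constraint,

$$\begin{aligned}
(\mathbf{V}, \mathbf{Q}) = \arg\min_{\mathbf{V}, \mathbf{Q}} \;& \prod_{k=1}^{K}\prod_{j=1}^{L_k} e_{kj} \\
\text{s.t.}\;& \|\mathbf{v}_{kj}\|_2 = 1, \quad k = 1, \ldots, K, \; j = 1, \ldots, L_k \\
& q_{kj} \ge 0, \quad k = 1, \ldots, K, \; j = 1, \ldots, L_k \\
& \|\mathbf{q}\|_1 = \sum_{k=1}^{K}\sum_{j=1}^{L_k} q_{kj} \le P_{\max}.
\end{aligned} \tag{13}$$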

Theorem 2: Under linear MMSE decoding at the base station, the optimization problems
defined by (11) and (13) are equivalent.

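Proof: Using (10), we can rewrite $1 + \gamma_{kj}^{UL}$ as

$$1 + \gamma_{kj}^{UL} = \frac{\mathbf{u}_{kj}^H\mathbf{J}\mathbf{u}_{kj}}{\mathbf{u}_{kj}^H\mathbf{J}\mathbf{u}_{kj} - \mathbf{u}_{kj}^H\mathbf{H}_k\mathbf{v}_{kj}q_{kj}\mathbf{v}_{kj}^H\mathbf{H}_k^H\mathbf{u}_{kj}}.$$

It follows that by using the MMSE receiver from (12),

$$\begin{aligned}
\frac{1}{1 + \gamma_{kj}^{UL}} &= 1 - \frac{\mathbf{u}_{kj}^H\mathbf{H}_k\mathbf{v}_{kj}q_{kj}\mathbf{v}_{kj}^H\mathbf{H}_k^H\mathbf{u}_{kj}}{\mathbf{u}_{kj}^H\mathbf{J}\mathbf{u}_{kj}} \\
&= 1 - \frac{q_{kj}^2\left(\mathbf{v}_{kj}^H\mathbf{H}_k^H\mathbf{J}^{-1}\mathbf{H}_k\mathbf{v}_{kj}\right)^2}{q_{kj}\,\mathbf{v}_{kj}^H\mathbf{H}_k^H\mathbf{J}^{-1}\mathbf{H}_k\mathbf{v}_{kj}} \\
&= 1 - q_{kj}\mathbf{v}_{kj}^H\mathbf{H}_k^H\mathbf{J}^{-1}\mathbf{H}_k\mathbf{v}_{kj} = e_{kj}.
\end{aligned} \tag{14}$$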
This relationship is similar to one shown for MMSE detection in CDMA systems [39]. By applying (14) to (11), we see that

$$\sum_{k=1}^{K}\sum_{j=1}^{L_k}\log\left(1 + \gamma_{kj}^{UL}\right) = -\log\left(\prod_{k=1}^{K}\prod_{j=1}^{L_k}e_{kj}\right).$$

Since the constraints on $\mathbf{v}_{kj}$ and $q_{kj}$ are identical in (11) and (13), the problem of maximizing the sum rate in (11) is therefore equivalent to minimizing the PMSE in (13). ■
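The identity (14) and the resulting equality of objectives can be confirmed numerically. The sketch below builds the MMSE receivers (12) for all streams at once, reads the per-stream MSEs $e_{kj}$ off the diagonal of $\mathbf{E}^{UL}$, and compares them with $1/(1 + \gamma_{kj}^{UL})$; the random effective channels (columns standing in for $\mathbf{H}_k\mathbf{v}_{kj}$) and powers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
M, L, sigma2 = 4, 4, 0.2
Hbar = (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))) / np.sqrt(2)  # columns H_k v_kj
q = rng.uniform(0.1, 1.0, L)
q /= q.sum()                                    # ||q||_1 = P_max = 1
J = Hbar @ np.diag(q) @ Hbar.conj().T + sigma2 * np.eye(M)

Jinv_Hbar = np.linalg.solve(J, Hbar)            # J^{-1} H_k v_kj for every stream
U = Jinv_Hbar * np.sqrt(q)                      # MMSE receivers (12), one per column
S = Hbar.conj().T @ Jinv_Hbar                   # V^H H^H J^{-1} H V
E_UL = np.eye(L) - np.sqrt(q)[:, None] * S * np.sqrt(q)[None, :]
e = np.real(np.diag(E_UL))                      # per-stream MSEs e_kj

sinr = np.empty(L)
for l in range(L):
    h, u = Hbar[:, l], U[:, l]
    J_l = J - q[l] * np.outer(h, h.conj())      # J_kj
    sinr[l] = q[l] * abs(np.vdot(u, h)) ** 2 / np.real(np.vdot(u, J_l @ u))   # eq. (10)
```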

Note that this result has been independently derived in [32], [33]. 

C. Algorithm: PMSE Minimization 

We now present an algorithm that minimizes the product of mean squared errors. The algorithm draws upon previous work based on uplink-downlink MSE duality [26], [27], which states that all achievable MSEs in the uplink for a given $\mathbf{U}$, $\mathbf{V}$, and $\mathbf{q}$ (with sum power constraint $\|\mathbf{q}\|_1 \le P_{\max}$) can also be achieved by a power allocation $\mathbf{p}$ in the downlink (where $\|\mathbf{p}\|_1 \le P_{\max}$). It operates by iteratively obtaining the downlink precoder matrix $\mathbf{U}$ and power allocations $\mathbf{p}$ and the virtual uplink precoder matrix $\mathbf{V}$ and power allocations $\mathbf{q}$. Each step minimizes the objective function by modifying one of these four variables while leaving the remaining three fixed.

1) Downlink Precoder: For a fixed set of virtual uplink precoders $\mathbf{V}_k$ and power allocation $\mathbf{q}$, the optimum virtual uplink decoder $\mathbf{U}$ is defined by (12). Each MSE $e_{kj}$ is minimized individually by this MMSE receiver, thereby also minimizing the product of MSEs. This $\mathbf{U}$ is normalized and used as the downlink precoder.

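2) Downlink Power Allocation: The downlink power allocation $\mathbf{p}$ is given by [27]:

$$\mathbf{p} = \sigma^2\left(\mathbf{D}^{-1} - \boldsymbol{\Psi}\right)^{-1}\mathbf{1},$$

where $\boldsymbol{\Psi}$ is the $L \times L$ cross coupling matrix defined as

$$[\boldsymbol{\Psi}]_{ij} = \begin{cases} |\bar{\mathbf{h}}_i^H\mathbf{u}_j|^2 = |\mathbf{u}_j^H\bar{\mathbf{h}}_i|^2, & i \neq j \\ 0, & i = j \end{cases}$$

and

$$\mathbf{D} = \mathrm{diag}\left(\frac{\gamma_1}{|\bar{\mathbf{h}}_1^H\mathbf{u}_1|^2}, \ldots, \frac{\gamma_L}{|\bar{\mathbf{h}}_L^H\mathbf{u}_L|^2}\right),$$

where $\bar{\mathbf{H}} = \mathbf{H}\mathbf{V} = [\bar{\mathbf{h}}_1, \ldots, \bar{\mathbf{h}}_L]$, $\mathbf{U} = [\mathbf{u}_1, \ldots, \mathbf{u}_L]$, $\gamma_l$ is the virtual uplink SINR (10) of the $l$th stream when all $L$ streams are indexed consecutively, and $\mathbf{1}$ is the all-ones vector of the required dimension.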
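The duality step can be sketched in a few lines. The code below follows the SINR-duality reading of the formula above, with $\gamma_l$ taken as the virtual uplink SINRs; the function name `downlink_powers`, the random stand-ins for $\bar{\mathbf{H}}$ and $\mathbf{U}$, and the restriction to strictly positive $\mathbf{q}$ are assumptions. The check confirms that the resulting downlink powers reproduce the uplink SINRs (with unit-norm receivers $\mathbf{v}_{kj}$) using the same total power.

```python
import numpy as np

def downlink_powers(Hbar, U, q, sigma2):
    """Dual downlink powers p = sigma^2 (D^{-1} - Psi)^{-1} 1 (illustrative sketch).

    Hbar : (M, L) columns hbar_l = H_k v_kj with unit-norm v_kj
    U    : (M, L) unit-norm columns u_l
    q    : (L,) virtual uplink powers, assumed strictly positive here
    Returns p and the uplink SINRs gamma_l that p reproduces in the downlink.
    """
    C = np.abs(Hbar.conj().T @ U) ** 2          # C[i, j] = |hbar_i^H u_j|^2
    Psi = C - np.diag(np.diag(C))               # cross-coupling matrix, zero diagonal
    gain = np.diag(C)
    # Uplink interference into stream i is sum_j q_j |u_i^H hbar_j|^2 = (Psi^T q)_i.
    gamma = q * gain / (Psi.T @ q + sigma2)
    D_inv = np.diag(gain / gamma)
    p = sigma2 * np.linalg.solve(D_inv - Psi, np.ones(len(q)))
    return p, gamma

rng = np.random.default_rng(6)
M, L, sigma2 = 4, 4, 0.1
Hbar = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))
U = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))
U /= np.linalg.norm(U, axis=0)
q = np.full(L, 0.25)
p, gamma = downlink_powers(Hbar, U, q, sigma2)

# Downlink SINRs with powers p: interference into stream i is (Psi p)_i.
C = np.abs(Hbar.conj().T @ U) ** 2
Psi = C - np.diag(np.diag(C))
gamma_dl = p * np.diag(C) / (Psi @ p + sigma2)
```

The equal total power follows from $\mathbf{q} = \sigma^2(\mathbf{D}^{-1} - \boldsymbol{\Psi}^T)^{-1}\mathbf{1}$ on the uplink side, whose entries sum to the same value as those of $\mathbf{p}$.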

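3) Virtual Uplink Precoder: Given a fixed $\mathbf{U}$ and $\mathbf{p}$, the optimal downlink decoders are the MMSE receivers:

$$\mathbf{V}_k = \mathbf{J}_k^{-1}\mathbf{H}_k^H\mathbf{U}_k\sqrt{\mathbf{P}_k}.$$

In this equation, $\mathbf{J}_k = \mathbf{H}_k^H\mathbf{U}\mathbf{P}\mathbf{U}^H\mathbf{H}_k + \sigma^2\mathbf{I}_{N_k}$ is the receive covariance matrix for user $k$. The optimum virtual uplink precoders are then the normalized columns of $\mathbf{V}_k$.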

4) Virtual Uplink Power Allocation: The power allocation problem on the virtual uplink solves (13) with a fixed matrix $\mathbf{V}$. While it is widely accepted that the power allocation subproblem in PMSE minimization (or equivalently, in sum rate maximization) is non-convex [30], [31], [40], recent work [32] has shown that the optimal power allocation can be found by formulating the subproblem as a Geometric Programming (GP) problem [41]. A similar approach was proposed in [31], where iterations of the sum rate maximization problem are solved by local approximations of the non-convex sum rate function as a GP. We employ numerical techniques (SQP) to solve the power allocation subproblem.
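Working with the logarithm of the PMSE turns the product in (13) into a sum without changing the minimizer. The sketch below hands this subproblem to SciPy's SLSQP routine, one readily available SQP implementation; it is not the GP formulation of [32], and like any local method it returns only a local optimum. The function names and the random effective channels (columns standing in for $\mathbf{H}_k\mathbf{v}_{kj}$ with $\mathbf{V}$ fixed) are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def log_pmse(q, Hbar, sigma2):
    """log of the PMSE objective (13) under MMSE receivers, for fixed V.

    Columns of Hbar are the effective channels H_k v_kj;
    e_kj = 1 - q_kj hbar^H J^{-1} hbar.
    """
    M = Hbar.shape[0]
    J = (Hbar * q) @ Hbar.conj().T + sigma2 * np.eye(M)
    s = np.real(np.sum(Hbar.conj() * np.linalg.solve(J, Hbar), axis=0))
    return float(np.sum(np.log(1.0 - q * s)))

def uplink_power_sqp(Hbar, sigma2, P_max, q0):
    """Local solution of the power subproblem with SciPy's SLSQP (an SQP method)."""
    L = Hbar.shape[1]
    res = minimize(log_pmse, q0, args=(Hbar, sigma2), method="SLSQP",
                   bounds=[(0.0, P_max)] * L,
                   constraints=[{"type": "ineq", "fun": lambda q: P_max - np.sum(q)}])
    return np.clip(res.x, 0.0, None)

rng = np.random.default_rng(7)
M, L, sigma2, P_max = 4, 4, 0.1, 1.0
Hbar = (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))) / np.sqrt(2)
q_uniform = np.full(L, P_max / L)               # every stream starts with power
q_opt = uplink_power_sqp(Hbar, sigma2, P_max, q_uniform)
```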

In summary, the PMSE minimization algorithm keeps three of the four parameters $(\mathbf{U}, \mathbf{p}, \mathbf{V}, \mathbf{q})$ fixed at each step and obtains the optimal value of the fourth. Because the PMSE objective function is non-increasing at each of the four parameter update steps and is bounded below, the sequence of objective values converges; the limit is in general only a local minimum. Termination of the algorithm is determined by the selection of a convergence threshold $\epsilon$.

Since the overall minimization problem (13) is not convex, all of the suggested methods are guaranteed to converge only to a local minimum. Nonetheless, simulations suggest that the locally optimal value of the sum rate is not overly sensitive to the choice of initialization point. It is important to ensure that the initial solution allocates power to all $L$ substreams, as the iterative algorithm tends not to reallocate power to streams that currently have zero power. A reasonable initialization is to select random unit-norm precoder vectors in $\mathbf{U}$ and uniform power allocated over all substreams. A summary of our proposed algorithm can be found in Table I.
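TABLE I

Iterative PMSE minimization algorithm

Iteration:

1- Downlink Precoder
$\mathbf{U}_k = \mathbf{J}^{-1}\mathbf{H}_k\mathbf{V}_k\sqrt{\mathbf{Q}_k}$

2- Downlink Power Allocation via MSE duality
$\mathbf{p} = \sigma^2\left(\mathbf{D}^{-1} - \boldsymbol{\Psi}\right)^{-1}\mathbf{1}$

3- Virtual Uplink Precoder
$\mathbf{V}_k = \mathbf{J}_k^{-1}\mathbf{H}_k^H\mathbf{U}_k\sqrt{\mathbf{P}_k}$, $\mathbf{v}_{kj} = \mathbf{v}_{kj}/\|\mathbf{v}_{kj}\|_2$

4- Virtual Uplink Power Allocation
$\mathbf{q} = \arg\min_{\mathbf{q}} \prod_{k=1}^{K}\prod_{j=1}^{L_k} e_{kj}$ s.t. $q_{kj} \ge 0$, $\|\mathbf{q}\|_1 \le P_{\max}$

5- Repeat 1-4 until $(\mathrm{PMSE}_{\text{old}} - \mathrm{PMSE}_{\text{new}})/\mathrm{PMSE}_{\text{old}} < \epsilon$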
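The steps of Table I can be strung together as in the following sketch. It is not the authors' implementation: SciPy's SLSQP routine stands in for the SQP solver of step 4, the duality step simply skips streams whose uplink power is zero, the function names are hypothetical, and the test scenario (two users, $M = 4$, $N_k = 2$, $L_k = 2$, 10 dB SNR) is an arbitrary choice. Because step 4 is solved only to a local optimum, the loop does not enforce a monotone decrease; it stops once the relative PMSE decrease drops below $\epsilon$. The returned sum rate is evaluated with (3) for the final downlink precoders and powers.

```python
import numpy as np
from scipy.optimize import minimize

def unit_cols(A):
    """Scale every column of A to unit Euclidean norm."""
    return A / np.linalg.norm(A, axis=0)

def log_pmse(q, Hb, sigma2):
    """log PMSE (13) with MMSE receivers; columns of Hb are H_k v_kj."""
    J = (Hb * q) @ Hb.conj().T + sigma2 * np.eye(Hb.shape[0])
    s = np.real(np.sum(Hb.conj() * np.linalg.solve(J, Hb), axis=0))
    return float(np.sum(np.log(1.0 - q * s)))

def downlink_sum_rate(H_list, U, p, groups, sigma2):
    """Sum of the rates (3), in bits/s/Hz."""
    total = 0.0
    for H, idx in zip(H_list, groups):
        G = H.conj().T @ U
        J_k = (G * p) @ G.conj().T + sigma2 * np.eye(G.shape[0])
        R_NI = J_k - (G[:, idx] * p[idx]) @ G[:, idx].conj().T
        total += np.linalg.slogdet(J_k)[1] - np.linalg.slogdet(R_NI)[1]
    return total / np.log(2)

def pmse_minimize(H_list, L_k, sigma2, P_max, eps=1e-6, max_iter=50, seed=0):
    """Sketch of Table I; H_list[k] is M x N_k and every user gets L_k streams."""
    rng = np.random.default_rng(seed)
    K, M = len(H_list), H_list[0].shape[0]
    L = K * L_k
    groups = [np.arange(k * L_k, (k + 1) * L_k) for k in range(K)]
    # Initialization: random unit-norm v_kj and uniform power on all L streams.
    V = [unit_cols(rng.standard_normal((H.shape[1], L_k))
                   + 1j * rng.standard_normal((H.shape[1], L_k))) for H in H_list]
    q = np.full(L, P_max / L)
    hbar = lambda V: np.hstack([H @ Vk for H, Vk in zip(H_list, V)])
    pmse_old = np.exp(log_pmse(q, hbar(V), sigma2))
    for _ in range(max_iter):
        Hb = hbar(V)
        # Step 1: normalized uplink MMSE receivers (12); sqrt(q) only rescales columns.
        J = (Hb * q) @ Hb.conj().T + sigma2 * np.eye(M)
        U = unit_cols(np.linalg.solve(J, Hb))
        # Step 2: p = sigma^2 (D^-1 - Psi)^-1 1 on streams with nonzero uplink power.
        C = np.abs(Hb.conj().T @ U) ** 2
        Psi = C - np.diag(np.diag(C))
        act = q > 1e-12
        Psi_a, gain_a = Psi[np.ix_(act, act)], np.diag(C)[act]
        gamma = q[act] * gain_a / (Psi_a.T @ q[act] + sigma2)
        p = np.zeros(L)
        p[act] = sigma2 * np.linalg.solve(np.diag(gain_a / gamma) - Psi_a,
                                          np.ones(act.sum()))
        # Step 3: normalized downlink MMSE decoders become the uplink precoders.
        V = []
        for H, idx in zip(H_list, groups):
            G = H.conj().T @ U
            J_k = (G * p) @ G.conj().T + sigma2 * np.eye(G.shape[0])
            V.append(unit_cols(np.linalg.solve(J_k, G[:, idx])))
        # Step 4: uplink power allocation by SQP (SciPy SLSQP), warm-started at q.
        res = minimize(log_pmse, q, args=(hbar(V), sigma2), method="SLSQP",
                       bounds=[(0.0, P_max)] * L,
                       constraints=[{"type": "ineq", "fun": lambda x: P_max - np.sum(x)}])
        q = np.clip(res.x, 0.0, None)
        # Step 5: stop when the relative PMSE decrease falls below eps.
        pmse_new = np.exp(log_pmse(q, hbar(V), sigma2))
        if (pmse_old - pmse_new) / pmse_old < eps:
            break
        pmse_old = pmse_new
    return U, p, downlink_sum_rate(H_list, U, p, groups, sigma2)

rng = np.random.default_rng(8)
K, M, N_k, L_k, P_max, snr_db = 2, 4, 2, 2, 1.0, 10.0
sigma2 = P_max / 10 ** (snr_db / 10)
H_list = [(rng.standard_normal((M, N_k)) + 1j * rng.standard_normal((M, N_k))) / np.sqrt(2)
          for _ in range(K)]
U, p, rate = pmse_minimize(H_list, L_k, sigma2, P_max)
```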



V. Numerical Examples 

In this section, we present simulation results to illustrate the performance of the proposed 
algorithms. In all cases, the fading channel is modelled as flat and Rayleigh, with i.i.d. channel coefficients distributed as $\mathcal{CN}(0, 1)$. The examples use a maximum transmit power of $P_{\max} = 1$; SNR is controlled by varying the receiver noise power $\sigma^2$. As stated earlier, the transmitter is assumed to have perfect knowledge of the channel matrix $\mathbf{H}$.
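The simulation setup above can be reproduced with a few lines of NumPy. The helper names below are hypothetical; the channels have i.i.d. $\mathcal{CN}(0, 1)$ entries (real and imaginary parts each of variance 1/2), and with $P_{\max} = 1$ the noise power for a target SNR in dB is $\sigma^2 = 10^{-\text{SNR}/10}$. The SNR grid in the example is an arbitrary choice.

```python
import numpy as np

def rayleigh_channels(K, M, N_k, rng):
    """K flat Rayleigh channels with i.i.d. CN(0, 1) entries.

    Each H_k is stored as M x N_k, so the downlink channel of user k is H_k^H.
    """
    return [(rng.standard_normal((M, N_k)) + 1j * rng.standard_normal((M, N_k))) / np.sqrt(2)
            for _ in range(K)]

def noise_power(snr_db, P_max=1.0):
    """Noise variance sigma^2 giving SNR = P_max / sigma^2 (SNR in dB)."""
    return P_max / 10.0 ** (snr_db / 10.0)

rng = np.random.default_rng(0)
H_list = rayleigh_channels(K=2, M=4, N_k=2, rng=rng)         # dimensions of Fig. 2
sigma2_grid = [noise_power(s) for s in (0, 5, 10, 15)]     # P_max = 1
big = rayleigh_channels(K=1, M=200, N_k=500, rng=rng)[0]    # many samples
mean_gain = np.mean(np.abs(big) ** 2)                       # close to 1 for CN(0, 1)
```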

A. Sum Capacity and Achievable Sum Rate 

We first compare the sum rate achievable using linear precoding to the information theoretic 
capacity of the BC. That is, we consider the spectral efficiency (measured in bps/Hz) that could 
be achieved under ideal transmission by drawing transmit symbols from a Gaussian codebook. 
Figure 2 illustrates how the proposed schemes perform when compared to the sum capacity for the broadcast channel (i.e., using dirty paper coding (DPC) [2]) and to linear precoding methods based on channel orthogonalization, i.e., block diagonalization (BD) and zero forcing (ZF) [15].¹ The convergence threshold for the PMSE algorithm is set at $\epsilon = 10^{-6}$. Note that curves for THP cannot be included for comparison, as the modulo and shaping losses from the DPC sum capacity are fundamentally related to THP's nonlinear modulation scheme.

[Figure: horizontal axis SNR $= P_{\max}/\sigma^2$ (dB)]

Fig. 2. Comparing PDetMSE, PMSE, DPC and orthogonalization-based methods, $K = 2$, $M = 4$, $N_k = 2$, $L_k = 2$

The simulations in Fig. 2 model a $K = 2$ user system with $M = 4$ transmit antennas and $N_k = 2$ receive antennas per user. We see a negligible difference in performance when comparing the PDetMSE algorithm to the PMSE solution. This is interesting because the relationship between PDetMSE and PMSE mirrors that of BD and ZF; that is, the PDetMSE can be viewed as the block-matrix formulation of the PMSE problem. In contrast, there is a significant performance difference between BD and ZF. This result is also gratifying because it suggests that the marginal gains achieved by joint processing do not merit the greatly increased computational complexity; the feasible PMSE solution can be used without a large penalty in performance. The PMSE and PDetMSE algorithms do demonstrate a divergence in performance from the theoretical DPC bound at higher SNR. This drop in spectral efficiency may reflect a fundamental gap between the (optimal) nonlinear DPC capacity and the rate achievable under linear precoding, but it may also be caused by the algorithms' convergence to local minima due to the non-convexity of the optimization problems.

¹Simulation results for the DPC, BD, ZF, and NuSVD plots were obtained by using the cvx optimization package [42], [43].

Fig. 3. Comparing PDetMSE, PMSE, DPC and orthogonalization-based methods, $K = 2$, $M = 4$, $N_k = 4$, $L_k = 2$
The PMSE algorithm outperforms the BD and ZF methods over the entire SNR range when
the orthogonalization-based schemes are forced to use all $N$ receive antennas. However, this
approach to orthogonalization is suboptimal; the optimal BD and ZF precoders may be found



At high SNR, the PMSE and PDetMSE precoders perform equivalently to the BD precoder with 
selection; we have observed that the PMSE and PDetMSE precoders (in conjunction with the 
MMSE receivers) behave like the BD precoder in orthogonalizing the channel at high SNR. The 
biggest gain in performance over orthogonalization-based solutions occurs at low to mid-SNR 
values, where BD and ZF suffer due to noise enhancement. 

Figure 3 presents simulation results for a system similar to that of Fig. 2, but with $N_k = 4$ receive
antennas per user. This system has fewer transmit antennas than receive antennas
($M < N$), so BD/ZF cannot be employed without selection. We include simulation results
for BD/ZF with selection, but note the large computational complexity required (selecting the
best of 162 candidate precoders). We compare these results to a generalized orthogonalization-based
approach, referred to as nullspace-directed SVD (NuSVD) in [18], and observe a large
difference in performance at high SNR. This gain in spectral efficiency can be attributed to
NuSVD's ability to use all $N = 8$ receive antennas, whereas BD and ZF are limited by the
requirement that the total number of receive antennas used not exceed $M$.
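The antenna constraint on BD can be illustrated with the standard null-space construction. In this construction, each user's precoder is restricted to the null space of the stacked channels of all other users. The following is a minimal sketch under two assumptions: i.i.d. Rayleigh channels and a hypothetical helper `bd_nullspace_bases`. It is not the implementation used for the figures. With $M = 4$ and $K = 2$, each user has a 2-dimensional null space when $N_k = 2$ (the Fig. 2 setting). When $N_k = 4$ (the Fig. 3 setting), the null space is empty with probability one, so BD has no room to transmit unless antennas are deselected.

```python
import numpy as np

def bd_nullspace_bases(H_list, tol=1e-10):
    """For each user k, an orthonormal basis (columns) of the null space of
    the stacked channels of all other users, as used by block diagonalization."""
    bases = []
    for k in range(len(H_list)):
        H_others = np.vstack([H for j, H in enumerate(H_list) if j != k])
        _, s, Vh = np.linalg.svd(H_others)       # full_matrices=True: Vh is M x M
        rank = int(np.sum(s > tol * max(s.max(), 1.0)))
        bases.append(Vh[rank:].conj().T)          # M x (M - rank) basis
    return bases

rng = np.random.default_rng(1)

def rayleigh(rows, cols):
    """i.i.d. complex Gaussian (Rayleigh-fading) channel matrix."""
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

M, K = 4, 2
null_dims = {}
for N_k in (2, 4):   # the Fig. 2 and Fig. 3 receive-antenna configurations
    H_list = [rayleigh(N_k, M) for _ in range(K)]
    bases = bd_nullspace_bases(H_list)
    null_dims[N_k] = [V.shape[1] for V in bases]
    # Any nonempty basis must be orthogonal to every other user's channel
    for k, V in enumerate(bases):
        for j, H in enumerate(H_list):
            if j != k and V.shape[1] > 0:
                assert np.allclose(H @ V, 0.0, atol=1e-9)
print(null_dims)
```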

Once again, Fig. 3 illustrates that the PMSE/PDetMSE approaches outperform orthogonalization,
particularly at low to mid-SNR values. This improvement in performance comes at
the expense of additional complexity. Although NuSVD and PMSE/PDetMSE are both iterative
algorithms, NuSVD requires only one (concave) waterfilling power allocation after the
precoder direction iterations converge, whereas the PMSE/PDetMSE minimization methods employ
numerical optimization algorithms (sequential quadratic programming, SQP) in each iteration.
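To show why a single waterfilling step is cheap compared with running SQP in every iteration, here is a minimal sketch of textbook waterfilling over parallel subchannels. It assumes positive subchannel gains $g_i$ that already include the noise normalization. The function name `waterfill` is hypothetical, and the sketch is not presented as NuSVD's exact power-allocation routine.

```python
import numpy as np

def waterfill(gains, total_power):
    """Textbook water-filling over parallel subchannels:
    maximize sum_i log2(1 + g_i p_i) subject to sum_i p_i = P, p_i >= 0.
    gains g_i are assumed positive and to already include the noise normalization."""
    gains = np.asarray(gains, dtype=float)
    order = np.argsort(gains)[::-1]          # strongest subchannel first
    inv = 1.0 / gains[order]                 # "floor" heights 1/g_i, ascending
    p_sorted = np.zeros_like(inv)
    for n in range(len(inv), 0, -1):         # try the n strongest subchannels
        mu = (total_power + inv[:n].sum()) / n   # candidate water level
        if mu > inv[n - 1]:                  # weakest active subchannel gets p > 0
            p_sorted[:n] = mu - inv[:n]
            break
    p = np.zeros_like(p_sorted)
    p[order] = p_sorted                      # undo the sort
    return p

powers = waterfill([2.0, 1.0, 0.25], total_power=1.0)
print(powers)   # the weakest subchannel is switched off
```

The whole allocation takes one sort plus a linear scan, so its cost is negligible next to solving a constrained nonlinear program at each iteration.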

Figure 4 shows how the sum throughput scales with the number of users $K$, for $M = 2K$
transmit antennas and $N_k = 2$ receive antennas per user at 5 dB average SNR. The number of
transmit antennas $M$ is chosen so that BD and ZF can be implemented without selection, because
selection-based BD and ZF are exponentially complex, with $4^K - 1$ possible precoders. This plot
illustrates how the proposed scheme takes advantage of the available degrees of freedom at the
transmitter and provides significantly higher throughput than the orthogonalization-based BD and
ZF schemes.

possible subsets of receive antennas.

Fig. 4. Scaling of sum rate with $K$, $M = 2K$, $N_k = L_k = 2$, SNR = 5 dB
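The candidate counts quoted for selection-based BD/ZF agree with a simple enumeration. This enumeration assumes that a candidate precoder corresponds to any non-empty choice of receive-antenna subsets, one per user, whose total size does not exceed $M$. That is our reading of the selection rule, not a definition given in the text, and the helper `num_selection_candidates` is hypothetical. Under this reading, the count is 162 for the Fig. 3 configuration and $4^K - 1$ for the Fig. 4 configuration.

```python
from itertools import product
from math import comb

def num_selection_candidates(K, N_k, M):
    """Count non-empty choices of receive-antenna subsets (one subset per user,
    each user having N_k antennas) whose total size does not exceed M."""
    total = 0
    for sizes in product(range(N_k + 1), repeat=K):   # subset size per user
        if 0 < sum(sizes) <= M:
            ways = 1
            for s in sizes:
                ways *= comb(N_k, s)                   # which antennas of that size
            total += ways
    return total

print(num_selection_candidates(K=2, N_k=4, M=4))                    # Fig. 3 setup
print([num_selection_candidates(K, 2, 2 * K) for K in range(1, 5)])  # Fig. 4 setup
```

Because the count grows exponentially in $K$, exhaustive selection quickly becomes impractical. This is why Fig. 4 uses $M = 2K$, which avoids selection altogether.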

The PMSE and PDetMSE algorithms do not require the explicit selection of the number of data streams $L_k$;
rather, this parameter is determined implicitly by the power allocation. However, we can force the PMSE
algorithm to allocate a maximum number of substreams $L_k$ to each user to gain further insight
into its behaviour. In Fig. 5, the number of streams in the $N_k = 4$ system described above is
varied from $L_1 = L_2 = 2$ to $L_1 = 3$ and $L_2 = 1$. The achievable sum rate in this system
decreases in the latter case, because the asymmetric stream allocation does not match the
symmetric (statistically identical) channel configuration: user 2 is restricted to a single
data stream and thus cannot take full advantage of good channel realizations. If the
goal is always to maximize the sum rate, the users should be allocated the maximum number of
data streams in as balanced a manner as possible. Note, however, that the PMSE algorithm can
provide unbalanced allocations if desired for other reasons (e.g., quality-of-service provisioning).

Fig. 5. Data stream allocation in PMSE optimization, $K = 2$, $M = 4$, $N_k = 4$

B. Implementation: Adaptive Modulation 

In contrast to the previous results based on Gaussian codebooks, we now consider the selection
of modulation constellations to achieve maximum throughput for a specified bit error rate
(BER) target $\beta_{kj}$ on user $k$'s $j$th substream. Since the PMSE algorithm assumes unit-energy
symbols, we use M-PSK constellations in our implementation. Note that the prior assumption
of Gaussian noise-plus-interference still holds approximately for a sufficient number of interferers,
by the central limit theorem. We propose two approaches for selecting the modulation scheme for each
substream.

1) Naive Approach: This approach selects the largest PSK constellation, with $b_{kj}$ bits per stream,
that satisfies the required BER constraint. The constraint is checked using a closed-form BER
approximation [44],

$$\mathrm{BER}_{\mathrm{PSK}}(\gamma) \approx c_1 \exp\left(\frac{-c_2\,\gamma}{2^{c_3 b} - c_4}\right), \qquad (15)$$

where $M = 2^b$ is the size of the PSK constellation. We apply the least aggressive of the bounds
proposed in [44] by using the values $c_1 = 0.25$, $c_2 = 8$, $c_3 = 1.94$, and $c_4 = 0$. We note that this
approximation only holds for $b \geq 2$; for BPSK, one can instead use the exact expression:

$$\mathrm{BER}_{\mathrm{BPSK}}(\gamma) = \frac{1}{2}\,\mathrm{erfc}\left(\sqrt{\gamma}\right). \qquad (16)$$
The BPSK expression can be used as a feasibility test for the specified BER target. If the
resulting BER under BPSK modulation is higher than $\beta_{kj}$, we have two options: either
declare the BER target infeasible, or transmit using the lowest modulation depth available (i.e.,
BPSK). In this work, we have elected to transmit using BPSK whenever the PMSE stage has
allocated power to the data stream.
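The naive rule can be sketched in a few lines. The sketch makes three assumptions. First, `snr` is the per-substream SINR $\gamma$ at the MMSE receiver output. Second, the hypothetical cap `b_max = 8` limits the constellation size. Third, an inactive stream is signalled by `snr <= 0`. The function names are illustrative only. The sketch applies (15) for $b \geq 2$ and the exact expression (16) for BPSK, and it falls back to BPSK for active streams, as described above.

```python
import math

# Constants of approximation (15) as quoted above from [44]
C1, C2, C3, C4 = 0.25, 8.0, 1.94, 0.0

def ber_psk_approx(snr, b):
    """Approximation (15) for M-PSK with M = 2**b (applied here for b >= 2)."""
    return C1 * math.exp(-C2 * snr / (2.0 ** (C3 * b) - C4))

def ber_bpsk(snr):
    """Exact BPSK bit error rate (16)."""
    return 0.5 * math.erfc(math.sqrt(snr))

def ber(snr, b):
    return ber_bpsk(snr) if b == 1 else ber_psk_approx(snr, b)

def naive_bits(snr, target, b_max=8):
    """Largest b <= b_max with ber(snr, b) <= target.  Streams without power
    carry 0 bits; active streams fall back to BPSK (b = 1) when even BPSK
    misses the target, following the choice made in the text."""
    if snr <= 0.0:
        return 0
    best = 1
    for b in range(2, b_max + 1):
        if ber(snr, b) <= target:
            best = b
    return best

for snr_db in (0, 10, 20):
    snr = 10 ** (snr_db / 10)
    print(snr_db, "dB ->", naive_bits(snr, target=1e-2), "bits")
```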

2) Probabilistic Approach: The naive approach is quite conservative, because there may be a
large gap between the BER requirement and $\mathrm{BER}_{b_{kj}}$, the BER achieved for each channel realization.
We suggest a probabilistic bit allocation scheme that switches between $b_{kj}$ bits (as determined
by the naive approach) and $b_{kj} + 1$ bits with probability
$$p_{kj} = \frac{\beta_{kj} - \mathrm{BER}_{b_{kj}}}{\mathrm{BER}_{b_{kj}+1} - \mathrm{BER}_{b_{kj}}},$$
so that the expected BER, $(1 - p_{kj})\,\mathrm{BER}_{b_{kj}} + p_{kj}\,\mathrm{BER}_{b_{kj}+1}$, equals the target $\beta_{kj}$.
This modulation strategy may not be appropriate for systems requiring instantaneous satisfaction
of BER constraints. However, the probabilistic method still achieves the desired BER in the
long term, averaged over multiple channel realizations.
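A minimal sketch of the probabilistic rule follows. It reuses the same (assumed) interpretation of `snr` as the post-receiver SINR and the same hypothetical cap `b_max` as the naive sketch. The switching probability is clipped to $[0, 1]$ to cover two cases: even BPSK misses the target, or the next constellation would already meet it. The example prints the switching probability and the empirical average number of bits for a 10 dB substream with $\beta_{kj} = 10^{-2}$. The resulting mixture BER equals the target by construction.

```python
import math
import random

C1, C2, C3, C4 = 0.25, 8.0, 1.94, 0.0   # constants of (15), as quoted above

def ber(snr, b):
    """Exact BPSK BER (16) for b = 1, approximation (15) for b >= 2."""
    if b == 1:
        return 0.5 * math.erfc(math.sqrt(snr))
    return C1 * math.exp(-C2 * snr / (2.0 ** (C3 * b) - C4))

def naive_bits(snr, target, b_max=8):
    """Naive allocation: largest b whose BER meets the target (BPSK fallback)."""
    if snr <= 0.0:
        return 0
    feasible = [b for b in range(2, b_max + 1) if ber(snr, b) <= target]
    return max(feasible, default=1)

def switch_probability(snr, target, b):
    """p = (target - BER_b) / (BER_{b+1} - BER_b), clipped to [0, 1]."""
    lo, hi = ber(snr, b), ber(snr, b + 1)
    if hi <= lo:
        return 0.0
    return min(max((target - lo) / (hi - lo), 0.0), 1.0)

def probabilistic_bits(snr, target, rng, b_max=8):
    """Use b_kj + 1 bits with probability p_kj, otherwise the naive b_kj."""
    b = naive_bits(snr, target, b_max)
    if b == 0 or b >= b_max:
        return b
    p = switch_probability(snr, target, b)
    return b + 1 if rng.random() < p else b

snr, target = 10.0, 1e-2              # 10 dB post-receiver SINR (hypothetical)
p = switch_probability(snr, target, naive_bits(snr, target))
rng = random.Random(0)
draws = [probabilistic_bits(snr, target, rng) for _ in range(100_000)]
print(f"p = {p:.3f}, average bits = {sum(draws) / len(draws):.3f}")
```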

Figure 6 shows the sum rate achieved in the same system configuration as in Fig. 2 ($K = 2$,
$M = 4$, $N_k = 2$) under the M-PSK modulation scheme described above. The simulations
use two data streams per user and a target bit error rate of $\beta_{kj} = 10^{-2}$; 5000 data and noise
realizations are used for each channel realization. The plot shows the average number of bits
per transmission for user 1; due to symmetry, the corresponding plot for user 2 is identical. Note
that, in contrast to the previous results based on Gaussian coding, which were expressed as spectral efficiency, the
sum rate in Fig. 6 is the average number of bits transmitted per realization using symbols from
a PSK constellation.

Fig. 6. Sum rate vs. SNR for user 1 with M-PSK modulation, $K = 2$, $M = 4$, $N_k = L_k = 2$

In Fig. 6, we apply the PSK modulation scheme to both the PMSE precoder and the
SMSE precoder designed in [27]. This plot shows that using the PMSE criterion
is justified at practical SNR values, with improvements of approximately one bit per transmission
near 15 dB. Furthermore, using the probabilistic modulation scheme (designated "PMSE-P")
yields an additional improvement of more than half a bit per transmission across all SNR values.

In Fig. 7, we plot average BER versus SNR for the same system configuration as in Fig. 6.
This plot illustrates how the naive bit allocation algorithm attempts to achieve the target BER
of $10^{-2}$ for all data streams under PMSE, but overshoots the target, converging to a BER
of approximately $5 \times 10^{-4}$. This can be attributed to the looseness of the BER bound mentioned
above. In contrast, the probabilistic rate allocation algorithm not only increases the rate, as
shown in Fig. 6, but also converges to a BER that is much closer to the desired target BER. The
remaining gap between the actual BER achieved and the target BER can again be attributed to
looseness in the approximations of (15) and (16).

Fig. 7. BER vs. SNR for user 1 with M-PSK modulation, $K = 2$, $M = 4$, $N_k = L_k = 2$

VI. Conclusions 

In this paper, we have considered the problem of designing a linear precoder to maximize 
sum throughput in the multiuser MIMO downlink under a sum power constraint. We have 
compared the maximum achievable sum rate performance of linear precoding schemes to the
sum capacity in the general MIMO downlink, without imposing constraints on the number of 
users, base station antennas, or mobile antennas. The problem was reformulated in terms of 
MSE based expressions, and the joint processing solution based on PDetMSE minimization 
was shown to be theoretically optimal, but computationally infeasible. A suboptimal framework 
based on scalar (per-stream) processing was then proposed, and an implementation was provided 
based on PMSE minimization and employing a known uplink-downlink duality of MSEs. We 
have evaluated the performance of these schemes against orthogonalizing approaches, which
suffer from noise enhancement, and have shown that the MSE based optimization schemes are 
able to achieve significant performance improvements. Furthermore, we have demonstrated that 
negligible performance losses occur when using the suboptimal PMSE criterion in comparison 
to the optimum PDetMSE criterion. 